Automating the Development of Syntax Tree Generators for an Evolving Language
Abstract
This paper describes an Eiffel system for rapid testing of grammars. Grammars are defined in an extended BNF notation that allows actions on the parse tree nodes to be defined as additional annotations. The actions are high-level descriptions, not procedural code, to transform a parse tree into a syntax tree. A parser producing a syntax tree for a language sentence can be automatically generated from the annotated grammar as a set of classes. The object-oriented environment permits a much higher degree of separation between syntax and semantics than is possible with traditional approaches. Structural grammar changes can be made without affecting already developed semantic routines. This gives a great advantage for early compiler implementations, when the language syntax is still evolving.

Introduction

When a new language of some complexity is created, this will often force the development of a number of successive compiler versions to support its evolving syntax and semantics. Although automatic parser generators like yacc [Joh] can ease the developer's burden a great deal, they have a number of limitations that make compiler maintenance hard for an evolving language.

Since the syntactic and semantic elements are mixed in a yacc specification, a large amount of recoding will often be needed as a result of mere structural changes to the language grammar. The lack of separation between syntax and semantics also makes it hard to write several processors for the same language (such as compiler, static checker, pretty printer, etc.) without considerable duplication of effort and risk of inconsistencies. The abstraction support of a good object-oriented language is what is needed to overcome some of the limitations above.

This paper describes a parser generator written in Eiffel [Mey] and a high-level notation for specifying actions to transform a parse tree into a syntax tree. The notation is based on basic graph transformations on trees, such as adding and removing vertices, contracting edges, and rearranging vertices. Given a grammar with annotated transformations, the generated parser will recognise a syntactically correct sentence of the language and deliver an abstract syntax tree, which can then be further processed by semantic routines.

If the set of keywords and operators is fairly stable during development of the language, then the generated syntax trees are expected to be at least as stable, regardless of structural grammar changes. This makes it possible to work efficiently on the evaluation scheme before the language has been frozen. Since the syntax transformation directives are high level, the task of keeping them consistent with the evolving language constructs becomes easy. This may be contrasted with yacc, where the actions to build a syntax tree have to be recoded in C each time the grammar is restructured.

The parser generator was designed to support the development of an analytic query language for geographical databases called GeoSAL [SZ]. The work was carried out as part of a joint project between the National Defense Research Establishment, NobelTech Systems AB, and Ericsson Radio Systems AB. The project is part of a national research and development program in information technology.

The Eiffel environment

Among the attractive features of the Eiffel distribution from Interactive Software Engineering Inc. (ISE) are substantial class libraries supporting basic data structures, lexical analysis, and parsing [MN]. Thus, for the most part, there is no need for the user to implement common data abstractions such as lists, hash tables, trees, stacks, and queues, since these are directly available and easily tailorable through subclassing.

The lexical library supports grammars of regular expressions and provides approximately the facilities of lex [LS]. Instead of the preprocessor approach of lex, the lexical classes contain operations to generate a lexical scanner from descriptions in a file and store it in internal object format in another file for subsequent use.

The parsing library classes map the constructs of an arbitrary LL grammar and provide operations for recursive descent parsing of the corresponding language [HM]. This is different from yacc, which provides bottom-up parsing of LALR languages (for an overview of compiling techniques see, for example, [FL]). The family of languages that can be expressed with LL grammars is somewhat smaller than the corresponding family for LALR grammars. On the other hand, most well-designed programming languages can be turned into LL form. Rare exceptions, such as the C/Pascal dangling else, can usually be taken care of by allowing the grammar to be ambiguous and then letting the parser apply disambiguating rules. This technique is also employed by yacc on LALR grammars.

Moreover, LL parsing has the advantage of much easier error reporting and recovery for the compiler compared to the LALR technique. So it suited our needs well, since we wanted to reduce the effort of syntax control and spend as much time as possible on the semantics of the language under development.

Object-oriented parsing

The idea underlying the Eiffel parsing library, which is a direct application of the phrase "syntax-directed compiling", is to model each production of a grammar by a separate class. This object-oriented approach to parsing has several advantages. Encapsulating each syntactic construct as an independent unit makes it easy to build an abstract syntax tree, which can then be traversed and decorated in successive passes. Different semantic actions can be applied to the same syntax tree, thus permitting several tools to share the same syntactic representation. Classes with algorithms for semantic analysis and evaluation can be developed independently, without having to rewrite the code each time superficial changes are made to the language grammar.

The Eiffel parsing library restricts the productions to three kinds of construct: aggregate, choice, and sequence (their exact meaning will be described in the next section). Each construct has a library class to inherit from, which implements the parsing details applicable to constructs of that type. All the user needs to do is fill in the concrete parts in the class text for each grammar construct. Although this is simple enough to do for a few classes, writing classes by hand for the grammar of a realistic language is not feasible, particularly for an evolving grammar, where they would constantly have to be rewritten.

This renders the parsing library next to useless unless complemented by a program that can generate the classes needed directly from a grammar description. Currently there is no such generator in the Eiffel distribution, but ISE has made one for internal use named yoocc ("Yes, an Object-Oriented Compiler Compiler", in homage to yacc), which is planned to be a product but is not yet released [HM]. By courtesy of ISE, we were allowed to reuse a set of classes for Eiffel source code generation, previously developed for yoocc, when building our own parser generator.
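As an illustration of the ideas above, the following is a minimal Python sketch (not Eiffel, and not the API of the Eiffel parsing library or of the generator described here). It models each production as an object of one of three construct kinds and then turns the resulting parse tree into a syntax tree by removing vertices and contracting edges. All names (`Node`, `Terminal`, `Aggregate`, `Choice`, `Sequence`, `to_syntax_tree`, `sum_rule`) are hypothetical. The semantics assumed for the three kinds are: an aggregate matches all of its parts in order, a choice matches the first alternative that succeeds, and a sequence matches one or more items separated by a separator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree vertex: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


class Terminal:
    """Matches one token: a fixed text, or any digit string if text is None."""
    def __init__(self, text=None):
        self.text = text

    def parse(self, tokens, pos):
        if pos >= len(tokens):
            return None
        tok = tokens[pos]
        ok = tok == self.text if self.text is not None else tok.isdigit()
        return (Node(tok), pos + 1) if ok else None


class Aggregate:
    """Assumed meaning: every part must match, in order."""
    def __init__(self, name, parts):
        self.name, self.parts = name, list(parts)

    def parse(self, tokens, pos):
        node = Node(self.name)
        for part in self.parts:
            result = part.parse(tokens, pos)
            if result is None:
                return None
            child, pos = result
            node.children.append(child)
        return node, pos


class Choice:
    """Assumed meaning: the first alternative that matches wins."""
    def __init__(self, name, alternatives=()):
        self.name, self.alternatives = name, list(alternatives)

    def parse(self, tokens, pos):
        for alt in self.alternatives:
            result = alt.parse(tokens, pos)
            if result is not None:
                return Node(self.name, [result[0]]), result[1]
        return None


class Sequence:
    """Assumed meaning: one or more items with a separator between them."""
    def __init__(self, name, item, separator):
        self.name, self.item, self.separator = name, item, separator

    def parse(self, tokens, pos):
        first = self.item.parse(tokens, pos)
        if first is None:
            return None
        node, pos = Node(self.name, [first[0]]), first[1]
        while True:
            sep = self.separator.parse(tokens, pos)
            if sep is None:
                return node, pos
            nxt = self.item.parse(tokens, sep[1])
            if nxt is None:
                return None  # a separator must be followed by an item
            node.children += [sep[0], nxt[0]]
            pos = nxt[1]


def to_syntax_tree(node, drop=frozenset({"+", "(", ")"})):
    """Remove punctuation vertices, then contract edges to only children."""
    kids = [to_syntax_tree(c, drop) for c in node.children
            if c.label not in drop]
    return kids[0] if len(kids) == 1 else Node(node.label, kids)


# Toy grammar: sum ::= term {"+" term};  term ::= number | "(" sum ")"
term = Choice("term")
sum_rule = Sequence("sum", term, Terminal("+"))
paren = Aggregate("paren", [Terminal("("), sum_rule, Terminal(")")])
term.alternatives += [Terminal(), paren]
```

For the input `"1 + ( 2 + 3 )".split()`, `sum_rule.parse(tokens, 0)` yields a parse tree that still contains `term` and `paren` vertices and the punctuation tokens, while `to_syntax_tree` keeps only the `sum` vertices and the numbers. If the toy grammar were restructured, for example by introducing an extra intermediate production around `term`, the contracted tree would stay the same, which is the kind of stability the text argues for.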
Similar resources
A Mde Approach for Language Engineering
Many development tools of modern Integrated Development Environments (IDEs) make an intensive use of abstract syntax tree (AST) representations of the software. This is the case of refactors, code formatters, or content assistants, among others. Such AST is usually an instance of an object oriented abstract syntax model. We propose to center the attention of Language Engineering (LE) on this mo...
DSL development based on target meta-models. Using AST transformations for automating semantic analysis in a textual DSL framework
This paper describes an approach to creating textual syntax for Domain-Specific Languages (DSL). We consider target meta-model to be the main artifact and hence to be developed first. The key idea is to represent analysis of textual syntax as a sequence of transformations. This is made by explicit operations on abstract syntax trees (AST), for which a simple language is proposed. Text-to-model ...
5 Future Work
shift in organizational thinking to automate the development of software that is presently coded by hand. However, only until software development is automated will major benefits in productivity, quality, reliability, and performance be possible. Designing components and building generators is difficult. We have made significant progress in understanding how components can be designed and ...
The ModelCC Model-Based Parser Generator
Formal languages let us define the textual representation of data with precision. Formal grammars, typically in the form of BNF-like productions, describe the language syntax, which is then annotated for syntax-directed translation and completed with semantic actions. When, apart from the textual representation of data, an explicit representation of the corresponding data structure is required,...
Typesafe Code Reuse Across ASTs via Code Generation
Writing data structures for abstract syntax trees (ASTs) in a conventional OO programming language is tedious and error-prone. Hence, programmers often use AST generators to generate OO code from a higher-level description. This article argues that the existing AST generators do not provide good support for programs that manipulate several similar structural variations of an AST. Using a conven...
Publication date: 1992